Introduction

This HTML report details a multiomic exploration of m6A-related genes in breast cancer cell lines from the Broad Institute Depmap datasets using the depmap R package, cancer dependency data described in further detail by Tsherniak, Aviad, et al. “Defining a cancer dependency map.” Cell 170.3 (2017): 564-576.. The DepMap data enables a broad view of gene essentiality via CRISPR screens, as well as gene copy number, mutation calls, and normalized and batch corrected transcript expression for a large panel of cancer cell lines, including breast cancer.

This report will analyze and explore the differences between m6A-related genes within Depmap breast cancer data for the following omics modalities:

  1. Gene Expression
  2. CRISPR Dependency
  3. Expression vs. Dependency
  4. Copy Number
  5. Methylation
  6. Mutations Calls

Summary of Results

The summarized results of this report are show below. For more detailed explanation how these conclusions were derived, please go to the following sections in the report.

  • Gene Expression
    • A notable expression pattern emerges for ER-negative/HER2-negative cell lines, with higher expression of RMB15, METTL16, YTHDC2, METTL14, IGF2BP2, and IGF2BP3 relative to other subtypes.
    • Some genes, such as HNRNPC and EIF4G2, are universally highly expressed in nearly all lines.
  • CRISPR Dependency
    • CRISPR knockout of several m6A-related genes (e.g., ABCF1, RBMX, EIF3A, HNRNPC, VIRMA (KIAA1429)) is highly lethal in DepMap breast cancer lines.
    • In contrast, YTHDF1, YTHDC2, and FTO knockouts appear largely non-lethal.
    • Overall, m6A-related genes show higher lethality than the global average dependency scores.
  • Expression vs. Dependency
    • Certain genes (e.g., ABCF1, ALKBH5, CBLL1) show a direct correlation between higher expression and stronger dependency.
  • Copy Number
    • IGF2BP1 exhibits the highest average copy number among m6A-related genes, with substantial variation across breast cancer lines.
    • CNV of most other m6A-related genes remain close to the diploid reference (log2 ~ 1).
  • Methylation
    • One site of the geneIGF2BP2 is highly methylation in about half of the breast cancer samples, but this methylation does not correlate with any relevant cell line annotations
    • Few other m6A-related genes appear to be methylated in the RRBS assay
  • Mutations
    • Missense mutations dominate among m6A-related genes, and VIRMA (KIAA1429) is most frequently mutated, though typically not annotated as damaging based on COSMIC mutation annotations.
    • C-to-T (and G-to-A) transitions are more common in metastatic lines, possibly reflecting APOBEC-associated mutational processes.
    • Within metastatic breast cancer lines, RBM15 mutations occur slightly more often than other m6A-related gene mutations.

Query Depmap Data

We use the depmap R package to access relevant DepMap datasets:

crispr: CRISPR-Cas9 essentiality (dependency) scores
copyNumber: copy number alterations per gene/cell line
TPM: transcript expression levels (RNA-seq)
mutationCalls: mutation data
metadata: cell-line metadata (including disease subtype).
## create ExperimentHub query object
eh <- ExperimentHub()
query(eh, "depmap")
#> ExperimentHub with 82 records
#> # snapshotDate(): 2024-10-24
#> # $dataprovider: Broad Institute
#> # $species: Homo sapiens
#> # $rdataclass: tibble
#> # additional mcols(): taxonomyid, genome, description,
#> #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> #   rdatapath, sourceurl, sourcetype 
#> # retrieve records with, e.g., 'object[["EH2260"]]' 
#> 
#>            title             
#>   EH2260 | rnai_19Q1         
#>   EH2261 | crispr_19Q1       
#>   EH2262 | copyNumber_19Q1   
#>   EH2263 | RPPA_19Q1         
#>   EH2264 | TPM_19Q1          
#>   ...      ...               
#>   EH7555 | copyNumber_22Q2   
#>   EH7556 | TPM_22Q2          
#>   EH7557 | mutationCalls_22Q2
#>   EH7558 | metadata_22Q2     
#>   EH7559 | achilles_22Q2
metadata <- eh[["EH7558"]]
crispr <- eh[["EH7554"]]
copyNumber <- eh[["EH7555"]]
TPM <- eh[["EH7556"]]
mutationCalls <- eh[["EH7557"]]

Metadata

We subset DepMap metadata for breast cancer lines using the keywords ("breast cancer") across relevant columns and display the resulting set of breast cancer cell line samples in a data table for easy scanning.

metadata %>%
  dplyr::filter(grepl("BREAST", cell_line)) %>%
  dplyr::select(-contains("issues"), -contains("stripped"), -contains("WTSI"),
                -c(aliases, cosmic_id, source)) %>%
  as.data.frame() -> bc_metadata

DT::datatable(bc_metadata)

Gene Expression

We pivot from long to wide, building a matrix of TPM expression with genes as rows and cell lines as columns. Then we optionally annotate columns with metadata (metastatic status, disease subtype, etc.) and generate a pheatmap to observe any potential clustering of samples by expression.

Interpretation: Expression of for m6A-related genes varies by genes across all breast cancer cell lines. Some genes are universally highly expressed such as HNRNPC, HNRNPA2B1, EIF4G2, G3BP1, while other genes display more moderate expression. Most interestingly, we observe that there appears to be a expression pattern for ER-negative cancer cell lines, specifically for genes RMB15, METTL16, YTHDC2, METTL14, IGF2BP2 and IGF2BP3. I should stress that this data is batch-normalized by the Depmap consortium, and therefore we shouldn’t expect that this pattern is a technical artifact.

TPM %>%
  dplyr::filter(cell_line %in% bc_metadata$cell_line,
                gene_name %in% m6A_genes$gene_name,
                !is.na(cell_line)) %>%
  as.data.frame() -> tpm_df

tpm_df %>%
  dplyr::select(gene_name, cell_line, rna_expression) %>%
  tidyr::pivot_wider(names_from = cell_line,
                     values_from = rna_expression) %>%
  tibble::column_to_rownames(var = "gene_name") %>%
  t() %>% as.data.frame() %>%
  tibble::rownames_to_column(var = "cell_line") %>%
  dplyr::left_join(bc_metadata, by = "cell_line") %>%
  as.data.frame() -> tpm_ann_df

data.frame(status = tpm_ann_df$primary_or_metastasis,
           subtype = tpm_ann_df$subtype_disease,
           sample_site = tpm_ann_df$sample_collection_site,
           TN_status = tpm_ann_df$lineage_sub_subtype,
           row.names = tpm_ann_df$cell_line) %>% 
  dplyr::mutate(TN_status = if_else(is.na(TN_status), "unknown", TN_status),
                subtype = if_else(is.na(subtype), "unknown", TN_status)) %>% 
  as.data.frame() -> sample_col

m6A_df %>% 
  dplyr::arrange(gene_name) %>%
  tibble::column_to_rownames(var = "gene_name") %>%
  as.data.frame() -> sample_row

tpm_ann_df %>%
  dplyr::select(cell_line:RBM15B) %>%
  tibble::column_to_rownames(var = "cell_line") %>%
  t() %>%
  pheatmap::pheatmap(
    annotation_col = sample_col,
    annotation_row = sample_row,
    fontsize = 7,
    border_color = NA,
    show_colnames = FALSE,
    main = paste0("log10 gene expression for m6A-related genes in breast cancer cell lines"))

CRISPR Dependency

We visualize how dependent these cell lines are on each m6A-related gene using plotly. The dashed lines represent:

Red line: the mean dependency across all m6A-related genes in just breast cancer lines.
Green line: the mean dependency across all lines and all genes in DepMap (global average).

Interpretation: CRISPR knockout of m6A-related genes ABCF1, RBMX, EIF3A, HNRNPC, VIRMA were strongly lethal to breast cancer lines, likely because these genes are crucial for cell viability, whereas CRISPR knockout of YTHDF1, YTHDC2 and FTO appear to be non-lethal to breast cancer lines, even displaying a mild proliferative effect. Additionally, the CRISPR deletion of m6A-related genes appears to be more lethal on average than the global average dependency.

crispr %>%
  dplyr::filter(cell_line %in% bc_metadata$cell_line,
                gene_name %in% m6A_genes$gene_name,
                !is.na(cell_line)) %>%
  as.data.frame() -> crispr_df

crispr %>% 
  dplyr::filter(!is.na(dependency)) %>%
  as.data.frame() -> crispr_global_dependency
  
crispr_df %>%
  dplyr::filter(!is.na(dependency)) %>%
  dplyr::group_by(gene_name) %>%
  dplyr::summarize(mean_dependency = mean(dependency, na.remove = TRUE)) %>% 
  as.data.frame() -> crispr_gene_mean_dep

crispr_df %>%
  dplyr::left_join(crispr_gene_mean_dep, by = "gene_name") %>%
  dplyr::arrange(desc(mean_dependency)) %>%
  dplyr::mutate(gene_name = factor(gene_name, levels = unique(gene_name))) %>%
  dplyr::select(-c(gene, cell_line)) %>% 
  dplyr::left_join(metadata, by = "depmap_id") %>%
  as.data.frame() -> crispr_gene_mean_dep_merged

crispr_gene_mean_dep_merged %>%
  dplyr::mutate(subtype_disease = if_else(is.na(subtype_disease), "unknown", subtype_disease)) %>% 
  ggplot(aes(x = gene_name, y = dependency, color = subtype_disease)) +
  geom_point(size = 0.75) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_hline(yintercept = mean(crispr_gene_mean_dep$mean_dependency, na.remove = TRUE),
             color = "darkred", linetype = "dashed") +
  geom_hline(yintercept = mean(crispr_global_dependency$dependency, na.remove = TRUE),
             color = "darkgreen", linetype = "dashed") +
  xlab("m6A-related-related genes") +
  ggtitle("m6A-related genes ranked by mean dependency score") -> p1
ggplotly(p1)

We reshape (pivot_wider) the data so that rows = genes, columns = breast cancer cell lines, and values = CRISPR dependency scores. Then we call pheatmap to visualize how essential each gene is across multiple lines in a grid layout.

Interpretation: We don’t observe a strong pattern in dependency scores in m6A-related genes across any of the annotations within Depmap breast cancer cell lines

crispr_df %>%
  dplyr::select(gene_name, cell_line, dependency) %>%
  tidyr::pivot_wider(names_from = cell_line,
                     values_from = dependency) %>%
  tibble::column_to_rownames(var = "gene_name") %>%
  t() %>% as.data.frame() %>%
  tibble::rownames_to_column(var = "cell_line") %>%
  dplyr::left_join(bc_metadata, by = "cell_line") %>%
  as.data.frame() -> crispr_ann_df

data.frame(status = crispr_ann_df$primary_or_metastasis,
           subtype = crispr_ann_df$subtype_disease,
           sample_site = crispr_ann_df$sample_collection_site,
           TN_status = crispr_ann_df$lineage_sub_subtype,
           row.names = crispr_ann_df$cell_line) %>% 
  dplyr::mutate(TN_status = if_else(is.na(TN_status), "unknown", TN_status),
                subtype = if_else(is.na(subtype), "unknown", TN_status)) %>% 
  as.data.frame() -> sample_col

m6A_df %>% 
  dplyr::arrange(gene_name) %>%
  tibble::column_to_rownames(var = "gene_name") %>%
  as.data.frame() -> sample_row

crispr_ann_df %>%
  dplyr::select(cell_line:YTHDF2) %>%
  tibble::column_to_rownames(var = "cell_line") %>%
  t() %>%
  pheatmap::pheatmap(
    annotation_col = sample_col,
    annotation_row = sample_row,
    show_colnames = FALSE,
    border_color = NA,
    fontsize = 7,
    main = paste0("Dependency scores for m6A-related genes in breast cancer cell lines"))

Gene Expression vs CRISPR Dependency

We join CRISPR dependency data with expression (TPM) for the same gene/cell line pairs, then plot expression vs. dependency. The stat_cor(method = "spearman") call adds a correlation value to see if higher m6A-related gene expression correlates with lower (or higher) essentiality.

Below is a concise biological rationale that ties together why the level of gene expression and the dependency score from CRISPR knockout screens (as reported in DepMap) can be related, while also explaining why the relationship is not always straightforward:

  1. Active Use and Importance in Cellular Function
  • If a gene is highly expressed in a given cell type, it often (though not always) indicates that the cell relies on its protein product for crucial processes.
  • As a result, CRISPR-mediated knockout of highly expressed, functionally important genes may yield strong dependency scores (i.e., cells show a significant loss in viability when the gene is knocked out).
  1. Gene Expression Does Not Always Predict Dependency
  • Some genes are highly expressed due to roles in basic cell maintenance (housekeeping) yet are still easily compensated for if knocked out (for example, due to redundant pathways or isoforms). These would show high expression but weaker CRISPR dependencies.
  • Conversely, some essential genes may be expressed at moderate or even low levels but are still absolutely required for a specific, critical pathway. Knocking these genes out leads to strong dependencies despite modest baseline expression.
  1. Context-Dependent Vulnerabilities
  • Tumor cells often develop specific vulnerabilities (“oncogene addiction”) based on mutations or epigenetic changes. In these scenarios, a gene’s expression level may be elevated or otherwise deregulated as a result of the tumor’s rewiring.
  • If the tumor becomes particularly reliant on that elevated gene’s protein function, the dependency score in CRISPR screens can be quite high—reflecting a synthetic lethal or oncogene-addicted phenotype.
  • In a different lineage or tumor with alternative genetic backgrounds, the same gene might not be essential even if expressed, due to compensatory signaling pathways.
  1. Regulatory Complexities
  • Genes can be under various layers of transcriptional and post-transcriptional regulation (including miRNAs, lncRNAs, or regulatory proteins). A strongly expressed transcript could be subject to rapid protein turnover, or vice versa.
  • Hence, the connection between mRNA abundance and functional protein activity (the true driver of cell-essential phenotypes) can be indirect.

Interpretation: We observe a significant direct correlation between m6A-related gene expression and CRISPR dependency scores for genes ABCF1, ALKBH5, and CBLL1. This suggests that these m6A-related genes in particular could have a critical role in core cell processes in breast cancer, or be evidence of “oncogene addiction” / “tumor dependency” processes and/or lack of redundant pathways compensating for the knockout of these genes. As such, these genes are strong candidates for targeted therapies or further functional investigation.

tpm_df %>%
  dplyr::select(-c(gene, cell_line, entrez_id)) %>%
  dplyr::left_join(crispr_df, by = c("depmap_id", "gene_name")) %>%
  dplyr::filter(!is.na(cell_line)) %>%
  ggplot(aes(x = dependency, y = rna_expression, colour = gene_name)) +
    geom_point() +
    stat_smooth(method = "lm", se = TRUE, formula = y ~ poly(x, 1, raw = TRUE),
                color = "black", linetype = "dashed", fill = "gray") +
    ggpubr::stat_cor(method = "spearman", label.x = -2, label.y = 0) +
    ggtitle(paste0("Correlation between m6A-related gene expression and ",
                   "CRISPR\ndependency scores in Depmap breast cancer cell lines")) +
    facet_wrap(~gene_name, ncol = 3) +
    theme(
      plot.title = element_text(size = 16),
      strip.text = element_text(size = 12))

Copy Number

We check log2 copy-number values per gene, shown below. The dashed line indicates a diploid reference. Higher or lower values suggest possible amplification or deletion for m6A-related genes in breast cancer cells. These genes are arranged by descending average copy number. For more information how Depmap CNV is calculated, please refer to the DepMap documentation.

Interpretation: IGF2BP1 average copy number is highest in breast cancer, but also displays the greatest variation. The other m6A-related genes display variation, but their mean CNV is close to 1

copyNumber %>%
  dplyr::filter(cell_line %in% bc_metadata$cell_line,
                gene_name %in% m6A_genes$gene_name,
                !is.na(cell_line)) %>%
  as.data.frame() -> copy_number_df

copy_number_df %>%
  dplyr::arrange(desc(log_copy_number)) %>% 
  dplyr::mutate(gene_name = factor(gene_name, levels = unique(gene_name))) %>%
  ggplot(aes(x = gene_name, y = log_copy_number, fill = gene_name)) +
    geom_violin() +
    geom_boxplot(width = 0.25) +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = "none") +
    geom_hline(yintercept = 1, color = "black", linetype = "dashed") +
    ggtitle(paste0("Log2 copy-number for m6A-related genes in breast cancer cell lines"))

RRBS Methylation

In order to explore epigenetic modulation of m6A-related genes, we assay the normalized methylation scores of these genes in reduced representation bisulfite sequencing (RRBS) data. For more on this information, please refer to the Depmap page.

Note: this is DNA methylation and not RNA methylation data!

Interpretation: We observe that the vast majority of m6A-related genes do not display methylation in this assay within Depmap breast cancer cell lines, with the exception of IGF2BP2, and to a lesser extent, other IGF2 genes. However, the strong methylation of IGF2 genes in roughly half of the breast cancer cell line samples does not appear to correlate with tumor status, ER or HER2 stratification, or other annotations.

readr::read_csv(
  paste0("./depmap_data/Methylation_(1kb_upstream_TSS)_subsetted_NAsdropped.csv")) %>%
  dplyr::rename(depmap_id = names(.)[1]) %>%
  dplyr::filter(depmap_id %in% bc_metadata$depmap_id) %>%
  tibble::column_to_rownames(var = "depmap_id") %>%
  t() %>% as.data.frame() %>% 
  tibble::rownames_to_column(var = "methylation_site") %>%
  dplyr::mutate(gene_name =  gsub("_.*", "", methylation_site)) %>%
  dplyr::filter(gene_name %in% m6A_genes$gene_name) %>%
  dplyr::select(-gene_name) %>%
  tibble::column_to_rownames(var = "methylation_site") %>%
  as.data.frame() -> meth_counts

meth_counts %>%
  t() %>% as.data.frame() %>%
  tibble::rownames_to_column(var = "depmap_id") %>%
  dplyr::left_join(bc_metadata, by = "depmap_id") %>%
  as.data.frame() -> meth_ann_df

data.frame(status = meth_ann_df$primary_or_metastasis,
           subtype = meth_ann_df$subtype_disease,
           sample_site = meth_ann_df$sample_collection_site,
           TN_status = meth_ann_df$lineage_sub_subtype,
           row.names = meth_ann_df$depmap_id) %>%
  dplyr::mutate(TN_status = if_else(is.na(TN_status), "unknown", TN_status),
                subtype = if_else(is.na(subtype), "unknown", TN_status)) %>%
  as.data.frame() -> sample_col

meth_counts %>%
  pheatmap::pheatmap(
    annotation_col = sample_col,
    show_colnames = FALSE,
    border_color = NA,
    fontsize = 7,
    main = paste0("Normalized methylation scores for m6A-related ",
                  "genes\nin breast cancer cell lines"))

Mutations

We look at mutationCalls table, combine with metadata, and keep only those entries with a m6A-related gene in breast cancer cell lines.

mutationCalls %>%
  dplyr::left_join(bc_metadata, by = "depmap_id") %>%
  dplyr::filter(cell_line %in% bc_metadata$cell_line,
                gene_name %in% c(m6A_genes$gene_name, "KIAA1429"),
                !is.na(cell_line)) %>%
  as.data.frame() -> mutation_calls_df

We create a balloon plot to see how mutation classification (e.g., “Missense”, “Nonsense”, “Splice site”) differs between primary vs. metastatic cell lines.

mutation_calls_df %>%
  ggplot(aes(x = var_class, fill = primary_or_metastasis)) +
    geom_bar(position = "dodge") +
    labs(x = "Mutation Type", y = "Number of Cell Lines") +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    ggtitle("Barplot of absolute numbers of annotated mutations of m6A genes\nin Depmap breast cancer cell lines")

The following balloon plot displays the same information:

Interpretation: We observe that missense mutations of m6A-related genes dominate within Depmap breast cancer cell lines.

table(mutation_calls_df$var_class, mutation_calls_df$primary_or_metastasis) %>%
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation type\nby metastatic status")

We create a balloon plot to see how mutation classification (e.g., “Missense”, “Nonsense”, “Splice site”) differs between primary vs. metastatic cell lines.

Interpretation: In line with the Zheng, et al analysis of the TCGA data, we observe the mutation of VIRMA/KIAA1429 to be the most commonly-mutated m6A-related gene across Depmap breast cancer cell lines.

table(mutation_calls_df$var_class, mutation_calls_df$gene_name) %>%
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation type\nby gene")

Interpretation: We observe missense mutations of m6A-related genes to be very elevated within HER2-positive, Depmap breast cancer cell lines, regardless of ER status.

table(mutation_calls_df$var_class, mutation_calls_df$lineage_sub_subtype) %>%
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation type\nby ER and HER2 status")

A balloon plot that partitions each bar by the m6A-related gene mutated, so you can see which genes tend to have which var_class.

Interpretation: Curiously, we observe that mutations of m6A-related genes largely appear to be annotated as non-damaging within Depmap breast cancer cell lines by the COSMIC consortium.

table(mutation_calls_df$var_annotation, mutation_calls_df$primary_or_metastasis) %>%
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation annotation\nby metastatic status")

We plot the variant annotation to see how it splits between primary vs metastatic lines.

Interpretation: Once again, in line with the Zheng, et al analysis of the TCGA data, we observe the mutation of VIRMA/KIAA1429 to be the most commonly-mutated m6A-related gene across Depmap breast cancer cell lines, but this mutation is not annotated as “damaging.”

table(mutation_calls_df$var_annotation, mutation_calls_df$gene_name) %>%
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation annotations\nby gene")

We create a custom column mutation_type that concatenates ref_allele and alt_allele (e.g. C_to_T) to highlight known mutational signatures. The bar chart categorizes them by primary vs. metastatic. This often reveals a high frequency of UV-associated C>T transitions in breast cancer.

Interpretation: While we do not observe any single mutation signature genes in primary tumor-derived Depmap breast cancer cell lines, curiously C to T and G to A transitions appear to dominate in m6A-related genes in metastatic tumor-derived Depmap breast cancer cell lines. C to T (and symmetrically G to A) transitions at dipyrimidine sites are the prototypical “UV mutational signature.” There is a strong etiological of melanoma with UV radiation, however this seems unlikely given the lack of UV exposure in breast tissue. APOBEC enzymes (in particular APOBEC3B) are cytidine deaminases that catalyze the deamination of cytosine to uracil, leading ultimately to C to T and G to A mutations. APOBEC hyperactivity in metastasis selecting for tumor subclones with defects in DNA repair and/or elevated APOBEC expression, thus amplifying C to T transitions is a more likely mechanism. Another possibility could be that metastatic progression can come with selective pressure for epigenetic dysregulation. If m6A-related genes contain regulatory or coding regions with enriched CpG sites, these might be especially susceptible to 5mC to T changes in more advanced, unstable genomes.

mutation_calls_df %>%
  dplyr::mutate(mutation_type = paste0(ref_allele, "_to_", alt_allele)) %>%
  dplyr::filter(stringr::str_length(mutation_type) < 8) %>%
  as.data.frame() -> mutation_calls_df2

table(mutation_calls_df2$mutation_type, mutation_calls_df2$primary_or_metastasis) %>% 
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation transitions\nby metastatic status")

We create a custom column mutation_type that concatenates ref_allele and alt_allele (e.g. C_to_T) to highlight known mutational signatures. The bar chart categorizes them by gene.

Interpretation: No mutation appears to dominate strongly, and there is no dominant mutational pattern of VIRMA/KIAA1429, in particular.

mutation_calls_df %>%
  dplyr::mutate(mutation_type = paste0(ref_allele, "_to_", alt_allele)) %>%
  dplyr::filter(stringr::str_length(mutation_type) < 8) %>%
  as.data.frame() -> mutation_calls_df2

table(mutation_calls_df2$mutation_type, mutation_calls_df2$gene_name) %>% 
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation transitions\nby gene")

Interpretation: Within primary breast cancer samples, we do not observe any pattern of mutation of m6A genes.

mutation_calls_df %>%
  dplyr::mutate(mutation_type = paste0(ref_allele, "_to_", alt_allele)) %>%
  dplyr::filter(stringr::str_length(mutation_type) < 8,
                primary_or_metastasis == "Primary") %>%
  as.data.frame() -> mutation_calls_df2

table(mutation_calls_df2$mutation_type, mutation_calls_df2$gene_name) %>% 
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation transitions\nby gene within primary tumor samples only")

Interpretation: Within metastatic breast cancer samples, we observe mutation of RMB15 to be slightly more common than other mutations.

mutation_calls_df %>%
  dplyr::mutate(mutation_type = paste0(ref_allele, "_to_", alt_allele)) %>%
  dplyr::filter(stringr::str_length(mutation_type) < 8,
                primary_or_metastasis == "Metastasis") %>%
  as.data.frame() -> mutation_calls_df2

table(mutation_calls_df2$mutation_type, mutation_calls_df2$gene_name) %>% 
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation transitions\nby gene within metastatic tumor samples only")

Interpretation: Within primary breast cancer samples, we observe that C to T and G to A transitions are elevated for ER-neg and HER2-neg tumor cell lines.

mutation_calls_df %>%
  dplyr::mutate(mutation_type = paste0(ref_allele, "_to_", alt_allele)) %>%
  dplyr::filter(stringr::str_length(mutation_type) < 8) %>%
  as.data.frame() -> mutation_calls_df2

table(mutation_calls_df2$mutation_type, mutation_calls_df2$lineage_sub_subtype) %>% 
  as.data.frame() %>%
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation transitions\nby gene by ER and HER2 status")

Because we would like to know what genes are mutated in this way, we select only ER-neg and HER2-neg tumor samples, and observe gene mutations below:

Interpretation: We observe that C to T and G to A transitions are elevated for ER-neg and HER2-neg tumor cell lines, but only for the gene RBM15 do such mutations it occur more than once in our dataset.

mutation_calls_df %>%
  dplyr::mutate(mutation_type = paste0(ref_allele, "_to_", alt_allele)) %>%
  dplyr::filter(stringr::str_length(mutation_type) < 8,
                lineage_sub_subtype == "ERneg_HER2neg") %>%
  as.data.frame() -> mutation_calls_df3

table(mutation_calls_df3$mutation_type, mutation_calls_df3$gene_name) %>% 
  as.data.frame() %>% 
  ggballoonplot(fill = "value") +
    scale_fill_gradientn(colors = my_cols) +
    ggtitle("Depmap breast cancer m6A-related gene mutation transitions\nby gene in ERneg HER2neg cell lines")

Conclusion and Next Steps

This exploration underscores the utility of DepMap data and R-based visualization workflows to generate hypotheses about m6A-related gene roles in breast cancer cell-line survival, copy number changes, associated mutations. Some potential next steps might include:

  1. Cross-referencing m6A-informed RNA-binding regions for m6A-related genes implicated in breast cancer, such as RMB15
  2. Perform differential expression analysis between Depmap cancer samples by ER-status
  3. Perform differential methylatiob analysis between Depmap cancer samples by ER-status
sessionInfo()
#> R version 4.4.3 (2025-02-28)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Brussels
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] DT_0.33              pheatmap_1.0.12      plotly_4.10.4       
#>  [4] biomaRt_2.62.1       ExperimentHub_2.14.0 AnnotationHub_3.14.0
#>  [7] BiocFileCache_2.14.0 dbplyr_2.5.0         BiocGenerics_0.52.0 
#> [10] depmap_1.20.0        ggpubr_0.6.0         ggrepel_0.9.6       
#> [13] stringr_1.5.1        viridis_0.6.5        viridisLite_0.4.2   
#> [16] ggplot2_3.5.1        tibble_3.2.1         tidyr_1.3.1         
#> [19] dplyr_1.1.4         
#> 
#> loaded via a namespace (and not attached):
#>  [1] DBI_1.2.3               gridExtra_2.3           httr2_1.1.1            
#>  [4] rlang_1.1.5             magrittr_2.0.3          compiler_4.4.3         
#>  [7] RSQLite_2.3.9           mgcv_1.9-1              png_0.1-8              
#> [10] vctrs_0.6.5             pkgconfig_2.0.3         crayon_1.5.3           
#> [13] fastmap_1.2.0           backports_1.5.0         XVector_0.46.0         
#> [16] labeling_0.4.3          rmarkdown_2.29          tzdb_0.5.0             
#> [19] UCSC.utils_1.2.0        purrr_1.0.4             bit_4.6.0              
#> [22] xfun_0.51               zlibbioc_1.52.0         cachem_1.1.0           
#> [25] GenomeInfoDb_1.42.3     jsonlite_1.9.1          progress_1.2.3         
#> [28] blob_1.2.4              parallel_4.4.3          broom_1.0.7            
#> [31] prettyunits_1.2.0       R6_2.6.1                bslib_0.9.0            
#> [34] stringi_1.8.4           RColorBrewer_1.1-3      car_3.1-3              
#> [37] jquerylib_0.1.4         Rcpp_1.0.14             knitr_1.50             
#> [40] readr_2.1.5             IRanges_2.40.1          Matrix_1.7-3           
#> [43] splines_4.4.3           tidyselect_1.2.1        rstudioapi_0.17.1      
#> [46] abind_1.4-8             yaml_2.3.10             curl_6.2.1             
#> [49] lattice_0.22-6          Biobase_2.66.0          withr_3.0.2            
#> [52] KEGGREST_1.46.0         evaluate_1.0.3          xml2_1.3.8             
#> [55] Biostrings_2.74.1       pillar_1.10.1           BiocManager_1.30.25    
#> [58] filelock_1.0.3          carData_3.0-5           stats4_4.4.3           
#> [61] generics_0.1.3          vroom_1.6.5             BiocVersion_3.20.0     
#> [64] S4Vectors_0.44.0        hms_1.1.3               munsell_0.5.1          
#> [67] scales_1.3.0            glue_1.8.0              lazyeval_0.2.2         
#> [70] tools_4.4.3             data.table_1.17.0       ggsignif_0.6.4         
#> [73] grid_4.4.3              crosstalk_1.2.1         AnnotationDbi_1.68.0   
#> [76] colorspace_2.1-1        nlme_3.1-167            GenomeInfoDbData_1.2.13
#> [79] Formula_1.2-5           cli_3.6.4               rappdirs_0.3.3         
#> [82] gtable_0.3.6            rstatix_0.7.2           sass_0.4.9             
#> [85] digest_0.6.37           farver_2.1.2            htmlwidgets_1.6.4      
#> [88] memoise_2.0.1           htmltools_0.5.8.1       lifecycle_1.0.4        
#> [91] httr_1.4.7              mime_0.12               bit64_4.6.0-1